In the last few posts, we've focused heavily on matrices and their applications. In this post, we're going to use matrices to learn about Kernels.
The Kernel of a function is the set of points that the function sends to 00. Amazingly, once we know this set, we can immediately characterize how the matrix (or linear function) maps its inputs to its outputs.
I hope that by the end of this post you will:
Understand what a Kernel of a function is and how it helps us understand a function better.
Realize that the inverses of output points are always some translation of the Kernel (for linear functions).
See that there are many pretty patterns and coincidences that flow out of the properties of linear functions.
Functions Across Spaces
In previous posts, we noticed how matrices are just linear functions. We found that the matrices we studied just rotate or stretch a vector in some way.
An example of a function that stretches its input. Source: http://developer.apple.com.
But we only studied square matrices (i.e. 2x2 or 3x3 matrices). What happens when our matrices aren’t square?
Let’s start with a 2x3 matrix that represents a linear function ff. Let's study a function ff defined by the matrix FF:
What happens if I apply this matrix on a vector v=[[3],[2],[1]]v = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}?
Let's find out:
{:[Fv=[[2,1,0],[0,1,3]][[3],[2],[1]]],[Fv=[[8],[5]]]:}\begin{aligned}
F v &= \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \\
F v &= \begin{bmatrix} 8 \\ 5 \end{bmatrix}
\end{aligned}
In other words, f([[3],[2],[1]])=[[8],[5]])f(\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}) = \begin{bmatrix} 8 \\ 5 \end{bmatrix}).
So ff effectively takes a point in 3 Dimensions, ([[3],[2],[1]]\begin{bmatrix}3 \\ 2 \\ 1\end{bmatrix}) and sends it to a point in 2 Dimensions ([[8],[5]]\begin{bmatrix} 8 \\ 5 \end{bmatrix}). We can see this below:
ff maps points from R^(3)R^3 to R^(2)R^2.
We interact with functions that take points from 3D to 2D all the time. For instance, everytime you take a picture with a camera you are taking a 3D space (the world you see), and collapsing it onto a 2D space (the camera sensor).
A camera converts a 3D object (the tree) into a 2d representation (image). Source: Wikipedia.
The input space for ff is R^(3)R^3 and the output space is R^(2)R^2. More formally, we write this as:
f:R^(3)rarrR^(2)f: R^3 \rightarrow R^2
Losing Information
Returning to the example of cameras, when we take pictures, we squash a 3D world onto a 2D sensor. In the process, we lose some amount of information primarily related to depth.
Specifically, points that are far away will appear close to each other even though they may be quite distant.
Eventually, points infinitely far away on the horizon all collapse onto the same point. We can see this in the example image below.
Notice how the pillars above appear closer and closer together even though they are staying the same distance apart. We have lost information about the distance between this pillars when converting to 2D. Image source: Unsplash.
So just like a camera, will our function also "lose" some information when it moves points from 3D to 2D? Will it collapse multiple points from the input to the same point in the output?
The Kernel - Set of Points that Map to 0
To answer this, let's start by seeing all the points v=[[v_(1)],[v_(2)],[v_(3)]]v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} that map onto the origin of the output space - [[0],[0]]\begin{bmatrix} 0 \\ 0 \end{bmatrix}. This gives us a good starting point for understanding which points from our input hit the same point in the output.
Which points map to [[0],[0]]\begin{bmatrix}0 \\ 0\end{bmatrix}?
Carrying out this multiplication, we see this is satisfied when:
2x+y=02x + y = 0
y+3z=0y + 3z = 0
Solving this for each variable in terms of yy, we find:
{:[x=-(y)/(2)],[z=-(y)/(3)]:}\begin{aligned}
x &= -\frac{y}{2} \\
z &= -\frac{y}{3}
\end{aligned}
So, f^(-1)([[0],[0]])f^{-1}(\begin{bmatrix} 0 \\ 0\end{bmatrix}) is a line parameterized by:
{:[x=-(t)/(2)],[y=t],[z=-(t)/(3)]:}\begin{aligned}
x &= -\frac{t}{2} \\
y &= t \\
z &= -\frac{t}{3} \\
\end{aligned}
for some t in Rt \in R.
This line is shown below. Some points on this line are: v_(1)=[[0],[0],[0]]v_1 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, v_(2)=[[-3],[6],[-2]]v_2 = \begin{bmatrix} -3 \\ 6 \\ -2 \end{bmatrix}, v_(3)=[[-4.5],[9],[-3]]v_3 = \begin{bmatrix} -4.5 \\ 9 \\ -3 \end{bmatrix}.
The Kernel of ff is the set of points that ff maps to 00. In this case, it forms a line.
There’s a term for this set - the Kernel. We call it the Kernel because this set of vectors maps to the origin, or the core, of the output space.
In our specific case, the Kernel is a line. When ff maps this line to the point 00, we lose information about the line - in the output space, the points on the line are no longer distinguishable.
Returning to our camera analogy, this is similar to how all points on the horizon are no longer distinguishable after the conversion from 3D to 2D. Thus, you can think of the Kernel as a quick way to see how the function compresses or loses information.
Terminology Aside
Let's get some more quick terminology out of the way before proceeding. We're going to use the following terms:
Image - the set of outputs of the function (i.e. everything in f(x)f(x)). The image of a point xx is just f(x)f(x).
Pre-Image - the set of inputs for the function (i.e. the xx in f(x)f(x)). The pre-image of a point yy is just f^(-1)(y)f^{-1}(y).
In the above example, the function ff maps the oval (Pre-Image) on the left to the point (Image) on the right.
Translations of the Kernel - Mapping to [[1],[1]]\begin{bmatrix} 1 \\ 1 \end{bmatrix}
We found the set of points that map to [[0],[0]]\begin{bmatrix} 0 \\ 0 \end{bmatrix} (i.e. the pre-image of the origin). We call this set the Kernel or KK for short.
Can we now similarly find the set of points that map to [[1],[1]]\begin{bmatrix} 1 \\ 1 \end{bmatrix}?
We're going to do this to show something really cool:
Once you know the pre-image of [[0],[0]]\begin{bmatrix} 0 \\ 0 \end{bmatrix}, it's super simple to find the pre-image of [[1],[1]]\begin{bmatrix} 1 \\ 1 \end{bmatrix} or any other point for that matter.
Finding the pre-image
Let's start by finding the points that maps to [[1],[1]]\begin{bmatrix} 1 \\ 1 \end{bmatrix} as before.
{:[Fv=[[1],[1]]],[[[2,1,0],[0,1,3]][[x],[y],[z]]=[[1],[1]]]:}\begin{aligned}
F v &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
\begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} &= \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\end{aligned}
Solving for each variable, we find that this is just the line defined by:
{:[x=(1-t)/(2)],[y=t],[z=(1-t)/(3)]:}\begin{aligned}
x &= \frac{1-t}{2} \\
y &= t \\
z &= \frac{1-t}{3}
\end{aligned}
for some t in Rt \in R.
f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix}1 \\ 1 \end{bmatrix}) is the set of points that ff maps to [[1],[1]]\begin{bmatrix} 1 \\ 1 \end{bmatrix}. This is also a line. Notice how similar it is to the line for KK.
This line looks awfully similar to the line for KK doesn't it?
Let's see them both on the same graph. Notice that they're parallel to each other!
Notice that f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) is parallel to KK. In other words, f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) is just KK shifted over.
Translating the Kernel
So what's the relation between the two lines we plotted above - f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) and K=f^(-1)([[0],[0]])K = f^{-1}(\begin{bmatrix} 0 \\ 0 \end{bmatrix})?
f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) is just a translation of KK.
It is a translation by any vector v inf^(-1)([[1],[1]])v \in f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}).
Adding KK to vv gives us the full pre-image, f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}).
Or said another way,
{:f^(-1)([[1],[1]])=v+K", for any "(v inf^(-1)([[1],[1]])):}\begin{aligned}
f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) = v + K & \text{, for any $v \in f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix})$} \\
\end{aligned}
This seems kind of too good to be true. Is it? Let's test it out!
Let's take a point k in Kk \in K. For instance, k=[[-3],[6],[2]]k = \begin{bmatrix} -3 \\ 6 \\ 2 \end{bmatrix}.
Let's take a v inf^(-1)([[1],[1]])v \in f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) like v=[[0],[1],[0]]v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.
So it is indeed the case here that f(k+v)f(k+v) is [[1],[1]]\begin{bmatrix}1 \\1 \end{bmatrix}!
All Translations of the Kernel are Pre-Images
Ok there's something kind of mind blowing going on here:
We took one point in f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}).
We added KK to it.
And suddenly we got ALL of f^(-1)([[1],[1]])f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix})!
In fact this is true more generally!
If you give me ANY point that maps to some point f(v)f(v), say vv, then I can find ALL the points that map to f(v)f(v) by just adding v+Kv+K.
Breaking Down Why
Let's break the above statement down into two parts.
First, we're saying that given some vv, all points in the set v+Kv+K will map to the same place as vv (i.e. f(v)=f(v+K)f(v) = f(v+K)).
Next, these are ALL the points that map to f(v)f(v). Or, every point that maps to f(v)f(v) must be in the set v+Kv+K.
The first statement (left hand side) simply states that all points on the line v+Kv+K must map to f(v)f(v). The second statement says that if a point maps to f(v)f(v), like AA, and BB above, then they must also fall on the line v+Kv+K.
Let's prove each of the above statements more formally, starting with the first.
1. all points in the set v+Kv+K will map to the same place as vv
A more formal way of saying this is:
f(v+K)=f(v)", for any "v inR^(3)f(v+K) = f(v) \text{, for any } v \in R^3
Let's break down why this is true. Take any k in Kk \in K (in the kernel). Then,
{:[f(v+k)=f(v)+f(k)"Since "f" is a linear function"],[f(v+k)=f(v)+0"Since "k in K],[f(v+k)=f(v)]:}\begin{aligned}
f(v+k) &= f(v) + f(k) & \text{Since $f$ is a linear function} \\
f(v+k) &= f(v) + 0 & \text{Since } k \in K\\
f(v+k) &= f(v)
\end{aligned}
The below video shows this visually.
By decomposing v+kv+k into vv, and kk, we see that f(v+k)=f(v)f(v+k) = f(v).
Additionally, given this is true for some v+kv+k, this is true for all points on the line v+Kv+K. The reason is that the different amounts of KK all contribute nothing different and it's only the value of vv that matters to ff. This is shown below:
Any point in KK, such as kk, k^(')k', and k^(″)k'', does not change the result of ff. Hence, f(v+K)=f(v)f(v+K) = f(v).
Let's now move to the next statement.
2. Every point that maps to f(v)f(v) must be in the set v+Kv+K
Essentially, this is saying that there can be no point ww such that ww maps to f(v)f(v) but is not in v+Kv+K.
Let's prove this.
Choose any v,wv, w such that f(v)=v^(')f(v) = v' and f(w)=v^(')f(w) = v'. We wish to show that w in v+Kw \in v+K.
Hence w^(')in Kw' \in K (as all points that map to 00 are in KK).
Thus, v+w^(')in v+Kv + w' \in v +K.
Since v+w^(')=v+w-v=wv+w' = v + w - v = w, w in v+Kw \in v +K.
So we've successfully proved our two points!
The Relation Between Translations of KK and Points in the Image
We've already seen something really cool - every translation of KK is the full pre-image of a point in ff.
Now is there any relation between how far apart two translations of KK are (say v+Kv+K and w+Kw+K) and how far apart their images are (f(v)f(v), f(w)f(w))?
Yes!
If v+Kv+K and w+Kw+K are v^(')v' apart, their images will be f(v^('))f(v') apart!
If v+Kv+K and w+Kw+K are v^(')v' apart, their images will be f(v^('))f(v') apart.
Why is this the case? It again follows pretty simply:
Let's now take a step back and view what's happening in the overall space.
Every point in the image can be seen as the image of some translation of KK. As we move KK around, we get new points in the image!
Conclusion
We've now seen some really cool things that you may not have noticed before:
Every matrix is a linear function and that linear function will have some kernelKK that maps to 00.
All pre-images of output points are just going to be translations of KK.
If v^(')v' is the distance between the translations of KK, f(v^('))f(v') is the distance between their images.
The last point actually leads us to the first isomorphism theorem of group theory. This broadly states that the relation between the sets of pre-images of a special type of function known as a homomorphism (in our case ff) is the exact same as the relation between the set of output points (we'll go into this in the next blog post!).
There are many practical uses of this knowledge but I wanted to share it for simpler reasons. Sometimes math is just pretty - it has all these cool properties that fit together so nicely that you can't help but enjoy seeing them.
For example:
Who would have thought that all the pre-image sets are just translations of each other?
Or that the relation between these pre-image sets mirrors the relation between the points in the image?
I hope you enjoyed getting a taste of some abstract algebra and I'll see you in the next post!